A Hierarchical Word Sequence Language Model

Authors

  • Xiaoyi Wu
  • Yuji Matsumoto
Abstract

Most language models used for natural language processing are continuous. However, the assumption underlying such models is too simple to cope with the data sparsity problem. Although many useful smoothing techniques have been developed to estimate the probabilities of unseen sequences, it is still important to make full use of the contextual information in the training data. In this paper, we propose a hierarchical word sequence language model to relieve the data sparsity problem. Experiments verify the effectiveness of our model.
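
To make the sparsity issue concrete, the sketch below (illustrative only; the tiny corpus and function names are invented, not taken from the paper) estimates maximum-likelihood trigram probabilities from counts. Any word sequence that never occurs in the training data receives probability zero, which is exactly what smoothing techniques, and the proposed HWS model, try to compensate for.

    from collections import Counter

    def mle_trigram_model(sentences):
        """Build a maximum-likelihood trigram estimator from tokenized sentences."""
        trigram_counts, history_counts = Counter(), Counter()
        for words in sentences:
            padded = ["<s>", "<s>"] + words + ["</s>"]
            for i in range(2, len(padded)):
                history = tuple(padded[i - 2:i])
                trigram_counts[history + (padded[i],)] += 1
                history_counts[history] += 1

        def prob(word, history):
            # Unsmoothed estimate: any unseen trigram gets probability zero,
            # which is the data sparsity problem that smoothing tries to fix.
            h = tuple(history)
            return trigram_counts[h + (word,)] / history_counts[h] if history_counts[h] else 0.0

        return prob

    prob = mle_trigram_model([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(prob("sat", ("the", "cat")))   # 1.0 -- seen in training
    print(prob("ran", ("the", "cat")))   # 0.0 -- unseen sequence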


Similar papers

An Improved Hierarchical Word Sequence Language Model Using Directional Information

For relieving the data sparsity problem, the Hierarchical Word Sequence (abbreviated as HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the standard n-gram method. In this paper, we use directional information to make HWS models more syntactically appropriate so that higher performance can be achie...
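
A minimal sketch of how such a frequency-based conversion might look, assuming a simple recursive scheme: the most frequent word of a span is taken as a pivot, the span is split around it, and the recursion emits parent-child pairs in place of ordinary left-to-right bigrams. The pivot rule, tie-breaking, and names here are assumptions for illustration, not the authors' exact algorithm.

    from collections import Counter

    def hws_bigrams(words, freq, parent="<root>"):
        """Recursively split a span at its most frequent word and emit
        (parent, pivot) pairs as hierarchical 'bigrams'."""
        if not words:
            return []
        # Pivot: the most frequent word in this span (ties go to the leftmost).
        i = max(range(len(words)), key=lambda k: (freq[words[k]], -k))
        pivot = words[i]
        pairs = [(parent, pivot)]
        pairs += hws_bigrams(words[:i], freq, pivot)       # left subsequence
        pairs += hws_bigrams(words[i + 1:], freq, pivot)   # right subsequence
        return pairs

    corpus = [["the", "cat", "sat", "on", "the", "mat"]]
    freq = Counter(w for sent in corpus for w in sent)
    print(hws_bigrams(corpus[0], freq))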


Written word recognition by the elementary and advanced level Persian-English bilinguals

According to a basic prediction made by the Revised Hierarchical Model (RHM), at early stages of language acquisition, strong L2-L1 lexical links are formed. RHM predicts that these links weaken with increasing proficiency, although they do not disappear even at higher levels of language development. To test this prediction, two groups of highly proficie...


A Generalized Framework for Hierarchical Word Sequence Language Model

Language modeling is a fundamental research problem with wide application across many NLP tasks. For estimating the probabilities of natural language sentences, most research on language modeling uses n-gram based approaches to factor sentence probabilities. However, the assumption underlying n-gram models is not robust enough to cope with the data sparseness problem, which affects the final performance o...
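
The n-gram factorization referred to above is the standard chain-rule decomposition with a Markov assumption (textbook notation, not specific to this paper):

    P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
                       \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

The data sparseness problem arises because the conditional counts needed on the right-hand side grow exponentially with n, so most histories are rarely or never observed in training data.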


A hierarchical Dirichlet language model

We discuss a hierarchical probabilistic model whose predictions are similar to those of the popular language modelling procedure known as 'smoothing'. A number of interesting differences from smoothing emerge. The insights gained from a probabilistic view of this problem point towards new directions for language modelling. The ideas of this paper are also applicable to other problems such as th...
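
The kind of prediction such a Dirichlet-prior model makes can be written, in common textbook notation (the notation here is mine; the paper's exact formulation may differ), as an interpolation between observed counts and a prior word distribution, which is why it resembles classical smoothing:

    P(w \mid h) = \frac{c(h, w) + \alpha\, m_w}{c(h) + \alpha},
    \qquad \text{with } \sum_w m_w = 1

where c(h, w) is the count of word w after history h, m is a prior distribution over words, and \alpha controls how strongly the prior is trusted relative to the counts.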


Investigation on language modelling approaches for open vocabulary speech recognition

By definition, words that are not present in a recognition vocabulary are called out-of-vocabulary (OOV) words. Recognition of unseen or new words is an important feature that is always desired in any real-world large vocabulary continuous speech recognition (LVCSR) system. However, human languages are complex in nature due to a wide variety of morphological richness such as inflections, deriva...
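
As a small illustration of the OOV definition above (a hedged sketch; the function name and data are invented, not from the paper), the OOV rate of a test set is simply the fraction of test tokens not covered by the recognition vocabulary:

    def oov_rate(test_tokens, vocabulary):
        """Fraction of test tokens missing from the recognition vocabulary."""
        vocab = set(vocabulary)
        misses = sum(1 for token in test_tokens if token not in vocab)
        return misses / len(test_tokens) if test_tokens else 0.0

    print(oov_rate(["the", "zebra", "sat"], ["the", "cat", "sat"]))  # 0.333...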



Publication year: 2014